Parameterization of vocal fry in HMM-based speech synthesis
نویسندگان
چکیده
HMM-based speech synthesis offers a way to generate speech with different voice qualities. However, sometimes databases contain certain inherent voice qualities that need to be parametrized properly. One example of this is vocal fry typically occurring at the end of utterances. A popular mixed excitation vocoder for HMM-based speech synthesis is STRAIGHT. The standard STRAIGHT is optimized for modal voices and may not produce high quality with other voice types. Fortunately, due to the flexibility of STRAIGHT, different F0 and aperiodicity measures can be used in the synthesis without any inherent degradations in speech quality. We have replaced the STRAIGHT excitation with a representation based on a robust F0 measure and a carefully determined two-band voicing. According to our analysis-synthesis experiments, the new parameterization can improve the speech quality. In HMM-based speech synthesis, the quality is significantly improved especially due to the better modeling of vocal fry.
منابع مشابه
HMM-based synthesis of creaky voice
Creaky voice, also referred to as vocal fry, is a voice quality frequently produced in many languages, in both read and conversational speech. To enhance the naturalness of speech synthesis, these latter should be able to generate speech in all its expressive diversity, including creaky voice. The present study looks to exploit our recent developments, including creaky voice detection, predicti...
متن کاملSpeech enhancement based on hidden Markov model using sparse code shrinkage
This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on the independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a Maximum a posterior (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...
متن کاملAdvances in Spectral Parameterization for Statistical (HMM-Based) TTS
HMM-based parametric speech synthesis has recently become an alternative to the concatenative TTS approach, especially when low footprint and general speech domain are required. A majority of speech parameterization models used in state-ofthe art HMM TTS systems employ source-filter waveform synthesis schemes. Sinusoidal representation and waveform generation of speech is an alternative to the ...
متن کاملThe GlottHMM Speech Synthesis Entry for Blizzard Challenge 2010
This paper describes the GlottHMM speech synthesis entry for Blizzard Challenge 2010. GlottHMM is a hidden Markov model (HMM) based speech synthesis system that utilizes glottal inverse filtering for separating the vocal tract from the glottal source. The source and the filter characteristics are modeled separately in the framework of HMM. In the synthesis stage, natural glottal flow pulses are...
متن کاملF0 parameterization of glottalized tones for HMM-based vietnamese TTS
A conventional HMM-based TTS system for Hanoi Vietnamese often suffers from the hoarse quality due to the incomplete F0 parameterization of glottalized tones. As estimating F0 in glottalization is rather problematic for usual F0 extractors, we propose a pitch marking algorithm where the pitch marks are propagated from regular regions of speech signal to glottalized one, from which the complete ...
متن کامل